On-the-fly Operation Batching in Dynamic Computation Graphs
Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer more flexibility for implementing models that cope with data of varying dimensions and structure, relative to toolkits that operate on statically declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing toolkits, both static and dynamic, require that the developer organize the computations into the batches necessary for exploiting high-performance data-parallel algorithms and hardware. This batching task is generally difficult, and it becomes a major hurdle as architectures become complex. In this paper, we present an algorithm, and its implementation in the DyNet toolkit, for automatically batching operations. Developers simply write minibatch computations as aggregations of single-instance computations, and the batching algorithm seamlessly executes them, on the fly, in computationally efficient batches. On a variety of tasks, we obtain throughput similar to that obtained with manual batching, as well as comparable speedups over single-instance learning on architectures that are impractical to batch manually.
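To make the idea of writing single-instance computations and having them executed in batches more concrete, here is a minimal sketch in plain Python. It is not DyNet's API and not necessarily the algorithm from the paper: the `Node` class, the `KERNELS` table, and the grouping rule (batch nodes that share the same operation and the same depth in the graph) are all assumptions made for illustration. The user code in `encode` handles one variable-length sequence at a time, and `forward` decides afterwards which recorded operations can run together.

```python
from collections import defaultdict


class Node:
    """One lazily recorded operation in a toy computation graph (hypothetical)."""

    def __init__(self, op, args=(), value=None):
        self.op = op                # "input", "scale", or "add"
        self.args = tuple(args)     # parent nodes this operation reads from
        self.value = value          # set for inputs, filled in by forward() otherwise
        # Depth is the longest path from any input. Every parent has a strictly
        # smaller depth, so nodes of equal depth never depend on each other.
        self.depth = 0 if not self.args else 1 + max(a.depth for a in self.args)


# Stand-ins for batched kernels: each receives a list of argument tuples and
# returns a list of results, modelling one data-parallel call.
KERNELS = {
    "scale": lambda batch: [2.0 * x for (x,) in batch],
    "add": lambda batch: [x + y for (x, y) in batch],
}


def forward(outputs):
    """Evaluate the outputs, running same-op nodes of equal depth as one batch."""
    # Collect every node reachable from the requested outputs.
    seen, stack = {}, list(outputs)
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen[id(node)] = node
            stack.extend(node.args)

    # Group non-input nodes by (depth, op): the assumed batching signature.
    groups = defaultdict(list)
    for node in seen.values():
        if node.op != "input":
            groups[(node.depth, node.op)].append(node)

    launches = 0
    for depth, op in sorted(groups):          # increasing depth: parents first
        nodes = groups[(depth, op)]
        batch = [tuple(a.value for a in n.args) for n in nodes]
        for n, result in zip(nodes, KERNELS[op](batch)):
            n.value = result
        launches += 1
    return [o.value for o in outputs], launches


def encode(xs):
    """Single-instance user code: a recurrence over one variable-length sequence."""
    h = Node("input", value=xs[0])
    for x in xs[1:]:
        h = Node("add", [Node("scale", [h]), Node("input", value=x)])
    return h


# A "minibatch" is just a list of single-instance graphs of different lengths.
outputs = [encode([1.0, 2.0]), encode([1.0, 2.0, 3.0])]
values, launches = forward(outputs)
print(values, launches)
```

In this toy run the two sequences record six operations in total, but `forward` issues only four batched calls, because the first `scale` and the first `add` of both sequences share a depth and an operation type. A real system would also need to check shapes and parameters before grouping, which this sketch omits.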
Reviews: On-the-fly Operation Batching in Dynamic Computation Graphs
Summary: The authors of this paper extend the neural network toolkit DyNet with automatic operation batching. Batching enables efficient utilization of CPUs and GPUs by turning matrix-vector products into matrix-matrix products and by reducing kernel launch overhead (on GPUs), but it is commonly done manually. Manual batching is manageable for simple feed-forward networks, but it becomes increasingly burdensome as we explore more flexible models that take variable-length or tree-structured input, or networks that make dynamic control decisions. Chainer, DyNet, and PyTorch are recently proposed neural network toolkits that allow the user to define the computation graph dynamically using the syntax of the host language (if, while, etc. in Python). This is desirable because it avoids toolkit-specific constructions (e.g., cond in TensorFlow) and makes the network definition intuitive, but it tends to limit performance because network construction and computation happen at the same time.
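The remark about turning matrix-vector products into matrix-matrix products can be checked with a small worked example. The sketch below uses plain Python lists rather than any toolkit; the helper names `matvec` and `matmul` and the numbers are illustrative assumptions. It applies the same weight matrix `W` to three instance vectors one at a time, then stacks the vectors as columns of a single matrix `X` and computes one product `W X`, whose columns match the per-instance results.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix W and a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(W, X):
    """Matrix-matrix product for list-of-rows matrices W and X."""
    cols = list(zip(*X))  # columns of X
    return [[sum(w * c for w, c in zip(row, col)) for col in cols] for row in W]


W = [[1.0, 2.0],
     [3.0, 4.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three single-instance inputs

# Unbatched: one matrix-vector product per instance (three separate calls).
per_instance = [matvec(W, x) for x in xs]

# Batched: stack the instance vectors as the columns of X (2 x 3), then do a
# single matrix-matrix product.
X = [list(row) for row in zip(*xs)]
batched = matmul(W, X)

# Column j of the batched result is the result for instance j.
batched_columns = [list(col) for col in zip(*batched)]
print(per_instance)
print(batched_columns)
```

The arithmetic is identical in both cases. The benefit on real hardware comes from issuing one large call to an optimized routine instead of many small ones, which this pure-Python version does not attempt to measure.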
On-the-fly Operation Batching in Dynamic Computation Graphs
Neubig, Graham, Goldberg, Yoav, Dyer, Chris